ABSTRACT

We study the problem of approximating the 3-profile of a 
large graph. 3-profiles are generalizations of triangle counts 
that specify the number of times a small graph appears as an 
induced subgraph of a large graph. Our algorithm uses the 
novel concept of 3-profile sparsifiers: sparse graphs that can 
be used to approximate the full 3-profile counts for a given 
large graph. Further, we study the problem of estimating local
and ego 3-profiles, two graph quantities that characterize
the local neighborhood of each vertex of a graph.

Our algorithm is distributed and operates as a vertex 
program over the GraphLab PowerGraph framework. We 
introduce the concept of edge pivoting which allows us to 
collect 2-hop information without maintaining an explicit
2-hop neighborhood list at each vertex. This enables the
computation of all the local 3-profiles in parallel with minimal
communication.

We test our implementation in several experiments scaling
up to 640 cores on Amazon EC2. We find that our algorithm 
can estimate the 3-profile of a graph in approximately the 
same time as triangle counting. For the harder problem of 
ego 3-profiles, we introduce an algorithm that can estimate 
profiles of hundreds of thousands of vertices in parallel, in 
the timescale of minutes. 


Categories and Subject Descriptors 

G.2.2 [Graph Theory]: Graph Algorithms; C.2.4 [Distributed
Systems]: Distributed Applications

Figure 1: Subgraphs in the 3-profile of a graph. We
call them (empty, edge, wedge, triangle). The 3-profile
of a graph G counts how many times each of
$H_i$ appears in G.

1. INTRODUCTION

Given a small integer k (e.g. k = 3 or 4), the k-profile
of a graph G(V, E) is a vector with one coordinate for each
distinct k-node graph $H_i$ (see Figure 1 for k = 3). Each
coordinate counts the number of times that $H_i$ appears as
an induced subgraph of G. For example, the graph $G = K_4$
(the complete graph on 4 vertices) has the 3-profile [0, 0, 0, 4]
since it contains 4 triangles and no other (induced) subgraphs.
The graph $C_5$ (the cycle on 5 vertices, i.e. a pentagon)
has the 3-profile [0, 5, 5, 0]. Note that the sum of the
k-profile is always $\binom{|V|}{k}$, the total number of k-node subgraphs.
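
As an illustration of the definition above, the following minimal sketch computes a 3-profile by brute force over all vertex triples. It is not the distributed algorithm of this paper; the function name `three_profile` and the edge-list input format are assumptions made for the example. For three nodes, the number of induced edges (0 to 3) identifies the subgraph type.

```python
from itertools import combinations
from math import comb


def three_profile(n, edges):
    """Return [n0, n1, n2, n3] for a simple undirected graph on vertices 0..n-1.

    Entry k counts vertex triples whose induced subgraph has exactly k edges
    (empty, edge, wedge, triangle).
    """
    present = {frozenset(e) for e in edges}
    profile = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(pair) in present for pair in ((a, b), (a, c), (b, c)))
        profile[k] += 1
    return profile


# The two examples from the text: K_4 and the 5-cycle C_5.
k4_edges = list(combinations(range(4), 2))
c5_edges = [(i, (i + 1) % 5) for i in range(5)]

print(three_profile(4, k4_edges))  # [0, 0, 0, 4]
print(three_profile(5, c5_edges))  # [0, 5, 5, 0]
print(sum(three_profile(5, c5_edges)) == comb(5, 3))  # True
```

The enumeration costs time cubic in the number of vertices, so it is only useful as a reference on small graphs.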

One can see k-profiles as a generalization of triangle (as
well as other motif) counting problems. They are increasingly
popular for graph analytics both for practical and
theoretical reasons. They form a concise graph description
that has found several applications for the web [4, 25],
social networks, and biological networks, and seem
to be empirically useful. Theoretically, they connect to the
emerging theory of graph homomorphisms, graph limits and
graphons.

In this paper we introduce a novel distributed algorithm
for estimating the 3-profiles (k = 3) of massive graphs. In addition
to estimating the (global) 3-profile, we address two
more general problems. One is calculating the local 3-profile
for each vertex $v_j$. This assigns a vector to each vertex that
counts how many times $v_j$ participates in each subgraph $H_i$.
These local vectors contain a higher resolution description of
the graph and are used to obtain the global 3-profile (simply
by rescaled addition as we will discuss).












The second related problem is that of calculating the ego
3-profile for each vertex $v_j$. This is the 3-profile of the graph
$N(v_j)$, i.e. the neighbors of $v_j$, also called the ego graph of
$v_j$. The 3-profile of the ego graph of $v_j$ can be seen as a
projection of the vertex into a coordinate system. This is
a very interesting idea of viewing a big graph as a collection
of small dense graphs, in this case the ego graphs of the
vertices. Note that calculating the ego 3-profiles for a set of
vertices of a graph is different (in fact, significantly harder)
than calculating local 3-profiles.

Contributions: Our first contribution is a provable edge
sub-sampling scheme: we establish sharp concentration results
for estimating the entire 3-profile of a graph. This allows
us to randomly discard most edges of the graph and still
have 3-profile estimates that are provably within a bounded
error with high probability. Our analysis is based on modeling
the transformation from original to sampled graph as
a one step Markov chain with transitions expressed as a
function of the sampling probability. Our result is that a
random sampling of edges forms a 3-profile sparsifier, i.e. a
subgraph that preserves the elements of the 3-profile with
sufficient probability concentration. Our result is a generalization
of the triangle sparsifiers by Tsourakakis et al. [38].
Our proof relies on a result by Kim and Vu [15] on concentration
of multivariate polynomials, similarly to [38]. Unfortunately,
the Kim and Vu concentration holds only for a class
of polynomials called totally positive and some terms in the
3-profile do not satisfy this condition. For that reason, the
proof of [38] does not directly extend beyond triangles. Our
technical innovation involves showing that it is still possible
to decompose our polynomials as combinations of totally
positive polynomials using a sequence of variable changes.

Our second innovation deals with designing an efficient,
distributed algorithm for estimating 3-profiles on the sub-sampled
graph. We rely on the Gather-Apply-Scatter model
used in GraphLab PowerGraph but, more generally, our
algorithm fits the architecture of most graph engines. We
introduce the concept of edge pivoting which allows us to 
collect 2-hop information without maintaining an explicit 
2-hop neighborhood list at each vertex. This enables the 
computation of all the local 3-profiles in parallel. Each edge 
requires only information from its endpoints and each vertex 
only computes quantities using data from incident edges. 
For the problem of ego 3-profiles, we show how to calculate 
them by combining edge pivot equations and local clique 
counts. 

We implemented our algorithm in GraphLab and performed
several experiments scaling up to 640 cores on Amazon
EC2. We find that our algorithm can estimate the 3-profile
of a graph in approximately the same time as triangle
counting. Specifically, we compare against the PowerGraph
triangle counting routine and find that it takes us only 1%-10%
more time to compute the full 3-profile. For the significantly
harder problem of ego 3-profiles, we were able to compute
(in parallel) the 3-profiles of up to 100,000 ego graphs
in the timescale of several minutes. We compare our parallel
ego 3-profile algorithm to a simple sequential algorithm that
operates on each ego graph sequentially and observe tremendous
scalability benefits, as expected. Our datasets involve
social network and web graphs with edges ranging in number
from tens of millions to over one billion. We present
results on both overall runtimes and network communication
on multicore and distributed systems.


2. RELATED WORK 

In this section, we describe several related topics and discuss
differences in relation to our work.

Graph Sub-Sampling: Random edge sub-sampling is a
natural way to quickly obtain estimates for graph parameters.
For the case of triangle counting such graphs are called
triangle sparsifiers [38]. Related ideas were explored in the
Doulion algorithm [37, 38] with increasingly strong concentration
bounds. The recent work by Ahmed et al. develops
subgraph estimators for clustering coefficient, triangle
count, and wedge count in a streaming sub-sampled graph.
Other recent work uses random sampling
to estimate parts of the 3- and 4-profile. These methods do
not account for a distributed computation model and require
more complex sampling rules. As discussed, our theoretical
results build on [38] to define the first 3-profile sparsifiers,
sparse graphs that are a fortiori triangle sparsifiers.

Triangle Counting in Graph Engines: Graph engines
(e.g. Pregel, GraphLab, Galois, GraphX) are frameworks
for expressing distributed computation
on graphs in the language of vertex programs. Triangle
counting algorithms [31] form one of the standard
graph analytics tasks for such frameworks [9, 30]. Prior work
lists triangles efficiently by partitioning the graph
into components and processing each component in parallel.
Typically, it is much harder to perform graph analytics over
the MapReduce framework but some recent work [26, 35] has
used clever partitioning and provided theoretical guarantees
for triangle counting.

Matrix Formulations: Fast matrix multiplication has been
used for certain types of subgraph counting. Alon et al. proposed
a cycle counting algorithm which uses the trace of a
matrix power on high degree vertices. Some of our edge
pivot equations have appeared in [16, 17, 40], all in a centralized
setting. Related approximation schemes and
randomized algorithms depend on centralized architectures
and computing matrix powers of very large matrices.

Frequent Subgraph Discovery: The general problem of
finding frequent subgraphs, also known as motifs or subgraph
isomorphisms, is to find the number of occurrences of
a small query graph within a larger graph. Typically frequent
subgraph discovery algorithms offer pruning rules to
eliminate false positives early in the search [41, 19, 10, 29].
This is most applicable when subgraphs have labeled
vertices or directed edges. For these problems, the number
of unique isomorphisms grows much larger than in our
application.

In prior work, subgraphs were queried on the ego graphs of users.
While enumerating all 3-sets and sampling 4-sets of neighbors
can be done in parallel, forming the ego subgraphs requires
checking for edges between neighbors. This suggests that
a graph engine implementation would be highly preferable
over an Apache Hive system. Our algorithms simultaneously
compute the ego subgraphs and their profiles, reducing the
amount of communication between nodes. Our algorithm
is suitable for both NUMA multicore and distributed architectures,
but our implementation focus in this paper is on
GraphLab.

Graphlets: First described in [27], graphlets generalize
the concept of vertex degree to include the connected subgraphs
a particular vertex participates in with its neighbors.
Unique graphlets are defined at a vertex based on its degree
in the subgraph. Graphlet frequency distributions (GFDs)
have proven extremely useful in the field of bioinformatics.
Specifically, GFD analysis of protein interaction networks
helps to design improved generative models [11], accurate
similarity measures, and better features for classification [24, 34].
Systems that use our edge pivot equations (in
a different form) appear in prior literature for calculating
GFDs [12, 22] but not for enabling distributed computation.


3. UNBIASED 3-PROFILE ESTIMATION

In this section, we are interested in estimating the number
of 3-node subgraphs of type $H_0$, $H_1$, $H_2$ and $H_3$, as depicted
in Figure 1, in a given graph G. Let the estimated counts
be denoted $X_i$, $i \in \{0, 1, 2, 3\}$. Let the actual counts be
$n_i$, $i \in \{0, 1, 2, 3\}$. This set of counts is called a 3-profile of
the graph, denoted with the following vector:

$$\mathbf{n}_3(G) = [n_0, n_1, n_2, n_3]. \tag{1}$$

Because the vector is a scaled probability distribution, there
are only 3 degrees of freedom. Therefore, we calculate

$$\mathbf{n}_3(G) = \left[\binom{|V|}{3} - n_1 - n_2 - n_3,\; n_1,\; n_2,\; n_3\right].$$

In the case of a large graph, computational difficulty in estimating
the 3-profile depends on the total number of edges
in the large graph. We would like to estimate each of the
3-profile counts within a multiplicative factor. So we first
sub-sample the set of edges in the graph with probability p.
We compute all 3-profile counts of the sub-sampled graph
exactly. Let $Y_0, Y_1, Y_2, Y_3$ denote the exact 3-profile counts of the
random sub-sampled graph. We relate the sub-sampled 3-profile
counts to the original ones through a one step Markov
chain involving transition probabilities. The sub-sampling
process is the random step in the chain. Any specific subgraph
is preserved with some probability and otherwise transitions
to one of the other subgraphs. For example, a 3-clique
is preserved with probability $p^3$. Figure 2 illustrates
the other transition probabilities.

In expectation, this yields the following linear system:

$$\begin{bmatrix} \mathbb{E}[Y_0] \\ \mathbb{E}[Y_1] \\ \mathbb{E}[Y_2] \\ \mathbb{E}[Y_3] \end{bmatrix} = \begin{bmatrix} 1 & 1-p & (1-p)^2 & (1-p)^3 \\ 0 & p & 2p(1-p) & 3p(1-p)^2 \\ 0 & 0 & p^2 & 3p^2(1-p) \\ 0 & 0 & 0 & p^3 \end{bmatrix} \begin{bmatrix} n_0 \\ n_1 \\ n_2 \\ n_3 \end{bmatrix}$$

from which we obtain unbiased estimators for each entry
in $X(G) = [X_0, X_1, X_2, X_3]$:

$$X_0 = Y_0 - \frac{1-p}{p} Y_1 + \frac{(1-p)^2}{p^2} Y_2 - \frac{(1-p)^3}{p^3} Y_3,$$
$$X_1 = \frac{1}{p} Y_1 - \frac{2(1-p)}{p^2} Y_2 + \frac{3(1-p)^2}{p^3} Y_3,$$
$$X_2 = \frac{1}{p^2} Y_2 - \frac{3(1-p)}{p^3} Y_3,$$
$$X_3 = \frac{1}{p^3} Y_3.$$

Lemma 1. X(G) is an unbiased estimator of n(G).

Proof. By substituting the linear system into the estimators, clearly
$\mathbb{E}[X_i] = n_i$ for i = 0, 1, 2, 3. □

We now turn to prove concentration bounds for the above
estimators. We introduce some notation for this purpose.
Let X be a real polynomial function of m real random variables
$\{t_i\}_{i=1}^{m}$. Let $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m) \in \mathbb{Z}_{+}^{m}$ and define
$\mathbb{E}_{\geq 1}[X] = \max_{\alpha : \|\alpha\|_1 \geq 1} \mathbb{E}(\partial^{\alpha} X)$, where

$$\mathbb{E}(\partial^{\alpha} X) = \mathbb{E}\left[\left(\frac{\partial}{\partial t_1}\right)^{\alpha_1} \cdots \left(\frac{\partial}{\partial t_m}\right)^{\alpha_m} \left[X(t_1, \ldots, t_m)\right]\right]. \tag{7}$$

Further, we call a polynomial totally positive if the coefficients
of all the monomials involved are non-negative. We
state the main technical tool we use to obtain our concentration
results.

Proposition 1 (Kim-Vu Concentration [15]). Let X
be a random totally positive Boolean polynomial in m Boolean
random variables with degree at most k. If $\mathbb{E}[X] \geq \mathbb{E}_{\geq 1}[X]$,
then

$$P\left(|X - \mathbb{E}[X]| > a_k \sqrt{\mathbb{E}[X]\, \mathbb{E}_{\geq 1}[X]}\, \lambda^k\right) = O\left(\exp\left(-\lambda + (k-1) \log m\right)\right) \tag{8}$$

for any $\lambda > 1$, where $a_k = 8^k (k!)^{1/2}$.

The above proposition was used to analyze 3-profiles of
Erdős-Rényi random ensembles ($G_{n,p}$) in [15]. Later, this
was used to derive concentration bounds for triangle sparsifiers
in [38]. Here, we extend §4.3 of [15] to the 3-profile
estimation process, on an arbitrary edge-sampled graph.


Theorem 1. (Generalization of triangle sparsifiers to 3-profile
sparsifiers) Let $\mathbf{n}(G) = [n_0, n_1, n_2, n_3]$ be the 3-profile
of a graph G(V, E). Let |V| = n and |E| = m. Let
$[Y_0, Y_1, Y_2, Y_3]$ be the 3-profile of the subgraph obtained
by sampling each edge in G with probability p. Let
$\alpha$, $\beta$ and $\Delta$ be the largest collection of $H_1$'s, wedges and triangles
that share a common edge. Define X(G) according to
the unbiased estimators of this section, and let $\epsilon > 0$ and $\gamma > 0$. If $p, \epsilon$ satisfy:

(9)

then $\|X(G) - \mathbf{n}(G)\|_{\infty} \leq 12\epsilon \binom{|V|}{3}$ with probability at least
1 -

Figure 3: Unique 3-subgraphs from vertex perspective
(white vertex corresponds to v).

$F_0(v)$ $F_1(v)$ $F_2(v)$ $F_3(v)$

Figure 4: 4-subgraphs for Ego 3-profiles (white vertex
corresponds to v).

Proof. Full proof can be found in the appendix. Note
that, as we mentioned, this proof uses a new polynomial
decomposition technique to apply the Kim-Vu concentration. □


The sampling probability p in Theorem 1 depends polylogarithmically
on the number of edges and linearly on the
fraction of each subgraph which occurs on a common edge.
For example, if all of the wedges in G depend on a single
edge, i.e. $\beta = n_2$, then the last equation suggests the
presence of that particular edge in the sampled graph will
dominate the overall sparsifier quality.
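
The estimation procedure of this section can be sketched in a few lines. The example below is a single-machine illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the exact profile is computed by brute force, and the estimator is obtained by inverting the expected edge-sampling transitions, in which a triple with j edges keeps exactly i of them with probability $\binom{j}{i} p^i (1-p)^{j-i}$.

```python
import random
from itertools import combinations
from math import comb


def three_profile(n, edges):
    """Exact 3-profile [n0, n1, n2, n3] by enumerating all vertex triples."""
    present = {frozenset(e) for e in edges}
    profile = [0, 0, 0, 0]
    for a, b, c in combinations(range(n), 3):
        k = sum(frozenset(pair) in present for pair in ((a, b), (a, c), (b, c)))
        profile[k] += 1
    return profile


def expected_sampled_profile(profile, p):
    """E[Y_i] = sum_j C(j, i) p^i (1-p)^(j-i) n_j for edge-sampling probability p."""
    return [
        sum(comb(j, i) * p**i * (1 - p) ** (j - i) * profile[j] for j in range(i, 4))
        for i in range(4)
    ]


def unbiased_estimate(y, p):
    """Invert the triangular system above: sampled counts Y -> estimates X."""
    y0, y1, y2, y3 = y
    q = (1 - p) / p
    x3 = y3 / p**3
    x2 = (y2 - 3 * q * y3) / p**2
    x1 = (y1 - 2 * q * y2 + 3 * q**2 * y3) / p
    x0 = y0 - q * y1 + q**2 * y2 - q**3 * y3
    return [x0, x1, x2, x3]


def sample_edges(edges, p, rng):
    """Keep each edge independently with probability p."""
    return [e for e in edges if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 30
    edges = [e for e in combinations(range(n), 2) if rng.random() < 0.3]
    p = 0.5
    runs = [
        unbiased_estimate(three_profile(n, sample_edges(edges, p, rng)), p)
        for _ in range(200)
    ]
    mean = [sum(r[i] for r in runs) / len(runs) for i in range(4)]
    print("exact profile:   ", three_profile(n, edges))
    print("mean of estimates:", [round(x, 1) for x in mean])
```

Averaged over many sampled graphs, the estimates should approach the exact profile; the spread of individual runs is what the concentration result bounds.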


4. LOCAL 3-PROFILE CALCULATION 

In this section, we describe how to obtain two types of 3-profiles
for a given graph G in a deterministic manner. These
algorithms are distributed and can be applied independently
of the edge sampling described in Section 3.

The key to our approach is to identify subgraphs at a
vertex based on the degree with which it participates in the
subgraph. From the perspective of a given vertex v, there
are actually six distinct 3-node subgraphs up to isomorphism
as given in Figure 3. Let $n_{0,v}$, $n^e_{1,v}$, $n^d_{1,v}$, $n^e_{2,v}$, $n^c_{2,v}$, and $n_{3,v}$
denote the corresponding local subgraph counts at v, where the
superscripts e, c and d indicate that v is an endpoint, the center,
or disconnected from the edge, respectively. We
will first outline an approach that calculates these counts
and then add across different vertex perspectives to calculate
the final 4 scalars ($n_{2,v} = n^e_{2,v} + n^c_{2,v}$ and $n_{1,v} = n^e_{1,v} + n^d_{1,v}$).
It is easy to see that the global counts can be obtained from
these local counts by summing across vertices:

$$n_i = \frac{1}{3} \sum_{v \in V} n_{i,v}, \quad i \in \{0, 1, 2, 3\}. \tag{10}$$

4.1 Distributed Local 3-profile 

We will now give our approach for calculating the local 3-profile
counts of G(V, E) using only local information combined
with |V| and |E|.

Scatter: We assume that every edge (v, a) has access to
the neighborhood sets of both v and a, i.e. $\Gamma(v)$ and $\Gamma(a)$.
Therefore, intersection sizes are first calculated at every
edge, i.e. $|\Gamma(v) \cap \Gamma(a)|$. Each edge computes the following
scalars and stores them:

$$n_{3,va} = |\Gamma(v) \cap \Gamma(a)|, \qquad n^c_{2,va} = |\Gamma(v)| - |\Gamma(v) \cap \Gamma(a)| - 1,$$
$$n^e_{2,va} = |\Gamma(a)| - |\Gamma(v) \cap \Gamma(a)| - 1,$$
$$n^e_{1,va} = |V| - \left(|\Gamma(v)| + |\Gamma(a)| - |\Gamma(v) \cap \Gamma(a)|\right). \tag{11}$$

The computational effort at every edge is at most $O(d_{\max})$,
where $d_{\max}$ is the maximum degree of the graph, for the
neighborhood intersection size.

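Gather: In the next round, vertex v "gathers" the above
scalars in the following way:

$$n^e_{2,v} = \sum_{a \in \Gamma(v)} n^e_{2,va}, \qquad n^c_{2,v} \overset{(a)}{=} \frac{1}{2} \sum_{a \in \Gamma(v)} n^c_{2,va}, \qquad n_{3,v} \overset{(b)}{=} \frac{1}{2} \sum_{a \in \Gamma(v)} n_{3,va},$$
$$n^e_{1,v} = \sum_{a \in \Gamma(v)} n^e_{1,va}, \qquad n^d_{1,v} \overset{(c)}{=} |E| - |\Gamma(v)| - n_{3,v} - n^e_{2,v}. \tag{12}$$

Here, relations (a) and (b) hold because triangles and wedges
from center are double counted; (c) comes from noticing
that each triangle and wedge from endpoint excludes an extra
edge from forming $H^d_1(v)$. In this gather stage, the communication
complexity is O(M) where it is assumed that
$\Gamma(v)$ is stored over M different machines. The corresponding
distributed algorithm is described in Algorithm 1.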
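
The scatter and gather steps of this subsection can be mimicked on one machine as follows. This is a sketch under stated assumptions rather than the GraphLab vertex program: the input is a duplicate-free undirected edge list on vertices 0..n-1, the function names are hypothetical, and the halving and the $|E| - |\Gamma(v)| - n_{3,v} - n^e_{2,v}$ term follow the counting arguments given above (each triangle and each center wedge at v is seen from two incident edges).

```python
from math import comb


def local_three_profiles(n, edges):
    """Per-vertex counts [n0_v, n1_v, n2_v, n3_v] from per-edge scalars."""
    nbrs = {v: set() for v in range(n)}
    for u, w in edges:
        nbrs[u].add(w)
        nbrs[w].add(u)
    m = len(edges)

    local = {}
    for v in range(n):
        tri = wedge_center = wedge_end = edge_end = 0
        for a in nbrs[v]:  # "scatter" scalars of edge (v, a), summed in the "gather"
            common = len(nbrs[v] & nbrs[a])
            tri += common
            wedge_center += len(nbrs[v]) - common - 1
            wedge_end += len(nbrs[a]) - common - 1
            edge_end += n - (len(nbrs[v]) + len(nbrs[a]) - common)
        n3 = tri // 2            # each triangle at v is seen from two edges
        n2c = wedge_center // 2  # each wedge centered at v is seen from two edges
        n2e = wedge_end
        n1e = edge_end
        n1d = m - len(nbrs[v]) - n3 - n2e  # edges with no endpoint adjacent to v
        n1, n2 = n1e + n1d, n2e + n2c
        n0 = comb(n - 1, 2) - n1 - n2 - n3
        local[v] = [n0, n1, n2, n3]
    return local


def global_from_local(local):
    """Every 3-subgraph is counted once at each of its three vertices."""
    return [sum(p[i] for p in local.values()) // 3 for i in range(4)]


c5 = [(i, (i + 1) % 5) for i in range(5)]
print(local_three_profiles(5, c5)[0])                 # [0, 3, 3, 0]
print(global_from_local(local_three_profiles(5, c5)))  # [0, 5, 5, 0]
```

On the 5-cycle each vertex sits in three single-edge triples and three wedges, and the rescaled sum recovers the global profile [0, 5, 5, 0] quoted in the introduction.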

4.2 Distributed Ego 3-profile 

In this section, we give an approach to compute ego 3-profiles
for a set of vertices $\mathcal{V} \subseteq V$ in G. For each vertex
v, the algorithm returns a 3-profile corresponding to that
vertex's ego N(v), a subgraph induced by the neighborhood
set $\Gamma(v)$, including edges between neighbors and excluding
v itself. Formally, our goal is to compute $\{\mathbf{n}(N(v))\}_{v \in \mathcal{V}}$.
Clearly, this can be accomplished in two steps repeated serially
on all $v \in \mathcal{V}$: first obtain the ego subgraph N(v) and
then pass it as input to Algorithm 1, summing over the ego
vertices $\Gamma(v)$ to get a global count. The serial implementation
is provided in Algorithm 2. We note that this was
essentially done in prior work, where ego subgraphs were extracted
from a common graph separately from 3-profile computations.
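
The serial two-step approach just described can be illustrated with a small reference implementation. It is only a sketch: instead of running Algorithm 1 on each ego graph, it enumerates neighbor triples directly, and the names `build_adjacency`, `ego_three_profile` and `ego_serial` are assumptions made for this example.

```python
from itertools import combinations


def build_adjacency(n, edges):
    """Neighbor sets for a simple undirected graph on vertices 0..n-1."""
    nbrs = {v: set() for v in range(n)}
    for u, w in edges:
        nbrs[u].add(w)
        nbrs[w].add(u)
    return nbrs


def ego_three_profile(nbrs, v):
    """3-profile of the subgraph induced by the neighbors of v (v itself excluded)."""
    profile = [0, 0, 0, 0]
    for a, b, c in combinations(sorted(nbrs[v]), 3):
        k = (b in nbrs[a]) + (c in nbrs[a]) + (c in nbrs[b])
        profile[k] += 1
    return profile


def ego_serial(n, edges, ego_vertices):
    """Process the ego vertices one at a time, as in a serial baseline."""
    nbrs = build_adjacency(n, edges)
    return {v: ego_three_profile(nbrs, v) for v in ego_vertices}


# Vertex 0 is adjacent to 1, 2, 3, 4; among its neighbors only 1-2 and 2-3 are edges.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3)]
print(ego_serial(5, edges, [0, 4]))  # {0: [1, 2, 1, 0], 4: [0, 0, 0, 0]}
```

The cost per ego vertex grows with the cube of its degree, and the loop over ego vertices is inherently sequential, which is the behavior the parallel algorithm is designed to avoid.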

Instead, Algorithm 3 provides a parallel implementation
which solves the problem by finding cliques in parallel for
all $v \in \mathcal{V}$. The main idea behind this approach is to realize
that calculating the 3-profile on the induced subgraph N(v)
is exactly equivalent to computing specific 4-node subgraph
frequencies among v and 3 of its neighbors, enumerated as
$F_i(v)$, $0 \leq i \leq 3$, in Figure 4. Now, the aim is to calculate
the $F_i(v)$'s, effectively part of a local 4-profile.

Scatter: We assume that every edge (v, a) has already computed
the scalars from (11). Additionally, every edge (v, a)
also computes the list $\mathcal{N}_{va} = \Gamma(v) \cap \Gamma(a)$ instead of only its
size. The computational complexity is still $O(d_{\max})$.

Gather: First, the vertex "gathers" the following scalars,
forming three edge pivot equations in unknown variables
$F_i(v)$:

$$\sum_{a \in \Gamma(v)} \binom{n^c_{2,va}}{2} = 3F_0(v) + F_1(v),$$
$$\sum_{a \in \Gamma(v)} \binom{n_{3,va}}{2} = F_2(v) + 3F_3(v),$$
$$\sum_{a \in \Gamma(v)} n^c_{2,va}\, n_{3,va} = 2F_1(v) + 2F_2(v). \tag{13}$$

By choosing two subgraphs that the edge (v, a) participates
in, and then summing over neighbors a, these equations
gather implicit connectivity information 2 hops away
from v. However, note that there are only three equations
in four variables and we must count one of them directly,
namely the number of 4-cliques $F_3(v)$. Therefore,
at the same gather step, the vertex also creates the list
$\mathcal{CN}_v = \bigcup_{a \in \Gamma(v)} \bigcup_{p \in \mathcal{N}_{va}} (a, p)$. Essentially, this is the list of
edges in the subgraph induced by $\Gamma(v)$. This requires worst
case communication proportional to the number of edges in
N(v), independent of the number of machines M.

Scatter: Now, at the next scatter stage, each edge (v, a) accesses
the pair of lists $\mathcal{CN}_v$, $\mathcal{CN}_a$. Each edge (v, a) computes
the number of 4-cliques it is a part of, defined as follows:

$$n_{4,va} = \sum_{i,j \in \Gamma(v) \cap \Gamma(a)} \mathbb{1}\left((i, j) \in \mathcal{CN}_v\right). \tag{14}$$

This incurs a computation time of $|\mathcal{CN}_v|$.

Gather: In the final gather stage, every vertex v accumulates
these scalars to get $F_3(v)$ as a rescaled sum of $n_{4,va}$ over
$a \in \Gamma(v)$, requiring
O(M) communication time. As in the previous section,
the scaling accounts for extra counting. Finally, the vertex
solves the equations (13) using $F_3(v)$.
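
To make the edge pivot idea concrete, the sketch below computes one ego 3-profile from per-edge scalars plus a direct 4-clique count, then back-substitutes through the three pivot equations. It is a single-machine illustration, not the distributed Ego-par program. Two choices are assumptions of this sketch: the helper names, and the factor 1/3 for $F_3(v)$, which arises here because each 4-clique at v is seen once from each of its three edges incident to v when unordered neighbor pairs are counted.

```python
from math import comb


def build_adjacency(n, edges):
    """Neighbor sets for a simple undirected graph on vertices 0..n-1."""
    nbrs = {v: set() for v in range(n)}
    for u, w in edges:
        nbrs[u].add(w)
        nbrs[w].add(u)
    return nbrs


def ego_profile_by_pivoting(nbrs, v):
    """Return [F0, F1, F2, F3] for vertex v using edge pivot sums."""
    sum_cc = sum_tt = sum_ct = clique_hits = 0
    for a in nbrs[v]:
        common = nbrs[v] & nbrs[a]        # neighbors of v adjacent to a
        t = len(common)                   # triangles on edge (v, a)
        c = len(nbrs[v]) - t - 1          # wedges on (v, a) centered at v
        sum_cc += comb(c, 2)              # = 3*F0 + F1
        sum_tt += comb(t, 2)              # = F2 + 3*F3
        sum_ct += c * t                   # = 2*F1 + 2*F2
        clique_hits += sum(
            1 for i in common for j in common if i < j and j in nbrs[i]
        )
    f3 = clique_hits // 3
    f2 = sum_tt - 3 * f3
    f1 = sum_ct // 2 - f2
    f0 = (sum_cc - f1) // 3
    return [f0, f1, f2, f3]


# Vertex 0 is adjacent to 1, 2, 3, 4; among its neighbors only 1-2 and 2-3 are edges.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3)]
nbrs = build_adjacency(5, edges)
print(ego_profile_by_pivoting(nbrs, 0))  # [1, 2, 1, 0]
```

Only the clique count needs 2-hop adjacency checks; the other three unknowns follow from sums of quantities each edge already holds.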

5. IMPLEMENTATION AND RESULTS 

In this section, we describe the implementation and the
experimental results of the 3-prof, Ego-par and Ego-ser
algorithms. We implement our algorithms on GraphLab
v2.2 (PowerGraph). The performance (running time and
network usage) of our 3-prof algorithm is compared with
the Undirected Triangles Count Per Vertex (hereinafter referred
to as trian) algorithm shipped with GraphLab. We
show that in time and network usage comparable to the
built-in trian algorithm, our 3-prof can calculate all the
local and global 3-profiles. Then, we compare our parallel
implementation of the ego 3-profile algorithm, Ego-par,
with the naive serial implementation, Ego-ser. It appears
that our parallel approach is much more efficient and scales
much better than the serial algorithm. The sampling approach,
introduced for the 3-prof algorithm, yields promising
results: reduced running time and network usage while
still providing excellent accuracy. We support our findings
with several experiments over various data sets and systems.

Vertex Programs: Our algorithms are implemented using
a standard GAS (gather, apply, scatter) model. We implement
the three functions gather(), apply(), and scatter()
to be executed by each vertex. Then we signal subsets
of vertices to run in a specific order.
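
For readers unfamiliar with vertex programs, the gather and apply pattern can be imitated with a toy in-memory loop. The sketch below is not the GraphLab API; `gas_round` and its parameters are hypothetical names, and the example only reproduces the first two rounds of Algorithm 1 to obtain per-vertex triangle counts.

```python
def gas_round(adjacency, gather, combine, apply):
    """One synchronous round: gather over incident edges, then apply per vertex."""
    out = {}
    for v, neighbors in adjacency.items():
        acc = None
        for a in neighbors:
            value = gather(v, a)
            acc = value if acc is None else combine(acc, value)
        out[v] = apply(v, acc)
    return out


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Round 1: every vertex gathers the IDs of its neighbors (the v.nb data).
nb = gas_round(
    adjacency,
    gather=lambda v, a: {a},
    combine=set.union,
    apply=lambda v, acc: acc or set(),
)

# Round 2: every edge contributes its common-neighbor count; the vertex halves
# the sum because each triangle is seen from two of its incident edges.
triangles = gas_round(
    adjacency,
    gather=lambda v, a: len(nb[v] & nb[a]),
    combine=lambda x, y: x + y,
    apply=lambda v, acc: (acc or 0) // 2,
)
print(triangles)  # {0: 1, 1: 1, 2: 1, 3: 0}
```

In a real graph engine the gather results are combined across machines and the per-edge work runs in a scatter phase; the loop here only conveys the order of operations.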


Algorithm 1 3-prof

Input: Graph G(V, E) with |V| vertices, |E| edges
Gather: For each vertex v, union over edges of the 'other'
vertex in the edge, $\bigcup_{a \in \Gamma(v)} a = \Gamma(v)$.
Apply: Store the gather as vertex data v.nb, size automatically
stored.
Scatter: For each edge $e_{va}$, compute and store scalars in (11).
Gather: For each vertex v, sum edge scalar data of neighbors:
$g \leftarrow \sum_{(v,a) \in \Gamma(v)} e.\mathrm{data}$.
Apply: For each vertex v, calculate and store the quantities
described in (12).
return [v: v.n0 v.n1 v.n2 v.n3]


Algorithm 2 Ego-ser

Input: Graph G(V, E) with |V| vertices, |E| edges, set of
ego vertices $\mathcal{V}$
for $v \in \mathcal{V}$ do
    Signal v and its neighbors.
    Include an edge if both its endpoints are signaled.
    Run Algorithm 1 on the graph induced by the neighbors
    and edges between them.
end for
return [v: vego.n0 vego.n1 vego.n2 vego.n3]


The Systems: We perform the experiments on three systems.
The first system is a single power server, further referred
to as Asterix. The server is equipped with 256 GB of
RAM and two Intel Xeon E5-2699 v3 CPUs, 18 cores each.
Since each core has two hardware threads, up to 72 logical
cores are available to the GraphLab engine.

The next two systems are EC2 clusters on AWS (Amazon
Web Services). One is comprised of 12 m3.2xlarge
machines, each having 30 GB RAM and 8 virtual CPUs.
Another system is a cluster of 20 c3.8xlarge machines, each
having 60 GB RAM and 32 virtual CPUs.

The Data: In our experiments we used five real graphs.
These graphs represent different datasets: social networks
(LiveJournal and Twitter), citations (DBLP), knowledge content
(Wikipedia), and WWW structure (PLD - pay level
domains). Graph sizes are summarized in Table 1.


Table 1: Datasets

Name              Vertices      Edges (undirected)
Twitter [18]      41,652,230    1,202,513,046
PLD [23]          39,497,204    582,567,291
LiveJournal [20]  4,846,609     42,851,237
Wikipedia         3,515,067     42,375,912
DBLP [20]         317,080       1,049,866




























Algorithm 3 Ego-par

Input: Graph G(V, E) with |V| vertices, |E| edges, set of
ego vertices $\mathcal{V}$
Gather: For each vertex v, union over edges of the 'other'
vertex in the edge, $\bigcup_{a \in \Gamma(v)} a = \Gamma(v)$.
Apply: Store the gather as vertex data v.nb, size automatically
stored.
Scatter: For each edge $e_{va}$, compute and store as edge
data:
    Scalars in (11).
    The list $\mathcal{N}_{va}$.
Gather: For each vertex v, sum edge data of neighbors:
    Accumulate LHS of (13).
    g.CN $\leftarrow$ g.CN $\cup\ \mathcal{N}_{va}$.
Apply: Obtain $\mathcal{CN}_v$ and equations in (13) using the
scalars and g.CN.
Scatter: Scatter $\mathcal{CN}_v$, $\mathcal{CN}_a$ to all edges (v, a).
    Compute $n_{4,va}$ as in (14).
Gather: Sum edge data $n_{4,va}$ of neighbors at v.
Apply: Compute $F_3(v)$.
return [v: vego.n0 vego.n1 vego.n2 vego.n3]


5.1 Results 

Experimental results are averaged over 3 to 10 runs.

Local 3-profile vs. triangle count: The first result is
that our 3-prof is able to compute all the local 3-profiles
in almost the same time as GraphLab's built-in trian
computes the local triangles (i.e., the number of triangles including
each vertex). Let us start with the first AWS cluster
with less powerful machines (m3.2xlarge). In Figure 5(a)
we can see that for the LiveJournal graph, for each sampling
probability p and for each number of nodes (i.e., machines
in the cluster), 3-prof achieves running times comparable
to trian. Notice also the benefit in running time achieved
by sampling. We can reduce running time almost by half,
without significantly sacrificing accuracy (which will be discussed
shortly). While the running time decreases as the
number of nodes grows (more computing resources become
available), the network usage becomes higher (see Figure 5(c))
due to the extensive inter-machine communication in
GraphLab. We can also see that sampling can significantly
reduce network usage. In Figures 5(b) and (d), we can see
similar behavior for the Wikipedia graph: running time and
network usage of 3-prof is comparable to trian.

Next, we conduct the experiments on the second AWS 
cluster with more powerful (c3.8xlarge) machines. For Live- 
Journal, we note modest improvements in running time for 
nearly the same network bandwidth observed in Figure 
On this system we were able to run 3-prof and trian on the
much larger PLD graph. In Figures 6(b) and (d) we compare
the running time and network usage of both algorithms.
For the large PLD graph, the benefit of sampling can be seen
clearly: by setting p = 0.1, the running time of 3-prof is
reduced by a factor of 4 and the network usage is reduced
by a factor of 2. Figure 7 shows the performance of 3-prof
and trian on the LiveJournal and Wikipedia graphs. We
can see that the behavior of running times and the network
usage of the 3-prof algorithm is consistently comparable
to trian across the various graphs, sampling, and system
parameters.

Let us now show results of the experiments performed on
a single powerful machine (Asterix). Figure 11(a) shows the
running times for 3-prof and trian for Twitter and PLD
graphs. We can see that on the largest graph in our dataset
(Twitter), the running time of 3-prof is less than 5% larger
than that of trian, and for the PLD graph the difference
is less than 3% (for p = 1). Twitter takes roughly twice
as long to compute as PLD, implying that these algorithms
have running time proportional to the graph's number of
edges.

Finally, we show that while the sampling approach can
significantly reduce the running time and network usage, it
has negligible effect on the accuracy of the solution. Notice
that the sampling accuracy refers to the global 3-profile
count (i.e., the sum of all the local 3-profiles over all vertices
in a graph). We also plot the accuracy of each scalar in
the 3-profile. As the accuracy metric, we use the ratio of
the exact count (obtained by running 3-prof with p = 1) to
the estimated count (i.e., the output of our 3-prof
when p < 1). It can be seen that for the three graphs, all the
accuracy ratios are very close to 1. E.g., for the PLD graph, even
when p = 0.01, the accuracy is within 0.004 of the ideal
value of 1. Error bars mark one standard deviation from the
mean, and across all graphs the largest standard deviation
is 0.031. As p decreases, the triangle estimator suffers the
greatest loss in both accuracy and consistency.



Figure 5: AWS m3.2xlarge cluster. 3-prof vs. trian algorithms
for LiveJournal and Wikipedia datasets (average
of 3 runs). 3-prof achieves comparable performance to
triangle counting. (a,b): Running time for various numbers
of nodes (machines) and various sampling probabilities p.
(c,d): Network bytes sent by the algorithms for
various numbers of nodes and various sampling probabilities p.

Figure 6: AWS c3.8xlarge cluster. 3-prof vs. trian algorithms
for LiveJournal and PLD datasets (average of 3
runs). 3-prof achieves comparable performance to triangle
counting. (a,b): Running time for various numbers
of nodes (machines) and various sampling probabilities p.
(c,d): Network bytes sent by the algorithms for various
numbers of nodes and various sampling probabilities p.

Ego 3-profiles: The next set of experiments evaluates the
performance of our Ego-par algorithm for counting ego 3-profiles.
We show the performance of Ego-par for various
graphs and systems and also compare it to a naive serial
algorithm Ego-ser. Let us start with the AWS system
with c3.8xlarge machines. In Figure 8 we see the
running time of Ego-ser and Ego-par on the LiveJournal
graph. The task was to find ego 3-profiles of 100, 1K, and
10K randomly selected nodes. Since the running time depends
on the size and structure of each induced subgraph,
Ego-ser and Ego-par operated on the same list of ego vertices.
While for 100 random vertices Ego-ser performed
well (and even achieved the same running time as Ego-par
for the PLD graph), its performance drastically degraded
for a larger number of vertices. This is due to its iterative
nature: it finds ego 3-profiles of the vertices one at a time
and is not scalable. Note that the open bars mean that this
experiment was not finished. The numbers above them are
extrapolations, which are reasonable due to the serial design
of Ego-ser.

On the contrary, the Ego-par algorithm scales extremely
well and computes ego 3-profiles for 100, 1K, and 10K vertices
in almost the same time. In Figure 9(a), we can see
that as the number of nodes (i.e., machines) increases, the running
time of Ego-par decreases since its parallel design allows
it to use additional computational resources. However,
Ego-ser cannot benefit from more resources and its running
time even increases when more machines are used. The increase
in running time of Ego-ser is due to the increase in
network usage when using more machines (see Figure 9(b)).
The network usage of Ego-par also increases, but this algorithm
compensates by leveraging additional computational
power. In Figure 10 we can see that Ego-par performs
well even when finding ego 3-profiles for all the LiveJournal
vertices (4.8M vertices).

Finally in Figure 11(b) and (c), we can see the comparison
of Ego-par and Ego-ser on the PLD and the DBLP graphs
on the Asterix machine. For both graphs, we see a very
good scaling of Ego-par, while the running time of Ego-ser
scales linearly with the size of the ego vertices list.

Figure 7: AWS c3.8xlarge cluster with 20 nodes. 3-prof
vs. trian results for LiveJournal and Wikipedia datasets
(average of 3 runs). (a): Running time for both graphs
for various sampling probabilities p. (b): Network bytes
sent by the algorithms for both graphs for various sampling
probabilities p.

Figure 8: AWS c3.8xlarge cluster. Ego-par vs. Ego-ser
results for LiveJournal and PLD datasets (average of 5
runs). Running time of Ego-par scales well with the number
of ego centers, while Ego-ser scales linearly.

6. CONCLUSIONS 

In summary, we have reduced several 3-profile problems
to triangle and 4-clique finding in a graph engine framework.
Our concentration theorem and experimental results confirm
that local 3-profile estimation via sub-sampling is comparable
in runtime and accuracy to local triangle counting.

This paper offers several directions for future work. First,
both the local 3-profile and the ego 3-profile can be used as
features to classify vertices in social or bioinformatic networks.
Additionally, we hope to extend our theory and algorithmic
framework to larger subgraphs, as well as special
classes of input graphs. Our edge sampling Markov chain
and unbiased estimators should easily extend to k > 3.
Equations in are useful to count local or global 4-profiles 
in a centralized setting, as shown recently in [17, 40]. Tractable
distributed algorithms for k > 3 using similar edge pivot
equations remain as future work. Our observed dependence
on the 4-clique count suggests that an improved graph engine-based
clique counting subroutine will improve the parallel
algorithm's performance.

Figure 9: AWS c3_8xlarge cluster. Ego-par vs. Ego-ser results for LiveJournal and Wikipedia datasets (average of 5 
runs). Running time of Ego-par decreases with the number of machines due to its parallel design. Running time of 
Ego-ser does not decrease with the number of machines due to its iterative nature. Network usage increases for both 
algorithms with the number of machines. 


Figure 11: Asterix machine. Results for Twitter, PLD, and DBLP datasets.
Panel titles: (a) Twitter and PLD, Asterix; (b) PLD, Asterix; (c) DBLP, Asterix.
(a) Running time of 3-prof vs. trian for various sampling probabilities p.
(b, c) Running time of Ego-par vs. Ego-ser for various numbers of ego centers.
Results are averaged over 3, 3, and 10 runs, respectively.



Figure 10: AWS c3_8xlarge cluster with 20 nodes.
Ego-par results for LiveJournal dataset (average of 5
runs). The algorithm scales well for various numbers of
ego centers and even for the full ego centers list. (a) Running
time. (b) Network bytes sent by the algorithm.

APPENDIX

A. PROOF OF THEOREMS] 

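Let m be the total number of edges in the original graph
G. If e is an edge in the original graph G, let $t_e$ be the
random indicator after sampling: $t_e = 1$ if e is sampled and
0 otherwise. Let $\mathcal{H}_0, \mathcal{H}_1, \mathcal{H}_2, \mathcal{H}_3$ denote the sets of distinct
subgraphs of the kind $H_0$, $H_1$, $H_2$ and $H_3$ (anti-clique, edge,
wedge and triangle) respectively. Let $A$, $-(e)$, $\Lambda(e,f)$ and
$\Delta(e,f,g)$ denote an anti-clique with no edges, an $H_1$ with
edge e, an $H_2$ with two edges e, f and a triangle with edges
e, f, g respectively in the original graph G. Our estimators
are a function of the $Y_i$'s, and each $Y_i$ can be written as
a polynomial of degree at most 3 in the variables $t_e$.

$$Y_0 = n_0 + \sum_{-(e) \in \mathcal{H}_1} (1 - t_e) + \sum_{\Lambda(e,f) \in \mathcal{H}_2} (1 - t_e)(1 - t_f) + \sum_{\Delta(e,f,g) \in \mathcal{H}_3} (1 - t_e)(1 - t_f)(1 - t_g) \tag{15}$$

$$\begin{aligned}
Y_1 = {} & \sum_{-(e) \in \mathcal{H}_1} t_e + \sum_{\Lambda(e,f) \in \mathcal{H}_2} \big((1 - t_e)t_f + (1 - t_f)t_e\big) + \sum_{\Delta(e,f,g) \in \mathcal{H}_3} t_e(1 - t_f)(1 - t_g) \\
& + \sum_{\Delta(e,f,g) \in \mathcal{H}_3} \big(t_f(1 - t_e)(1 - t_g) + t_g(1 - t_e)(1 - t_f)\big)
\end{aligned} \tag{16}$$

$$Y_2 = \sum_{\Lambda(e,f) \in \mathcal{H}_2} t_e t_f + \sum_{\Delta(e,f,g) \in \mathcal{H}_3} \big(t_e t_f(1 - t_g) + t_f t_g(1 - t_e) + t_g t_e(1 - t_f)\big) \tag{17}$$

$$Y_3 = \sum_{\Delta(e,f,g) \in \mathcal{H}_3} t_e t_f t_g \tag{18}$$

$$S_1 = \sum_{-(e) \in \mathcal{H}_1} t_e \tag{19}$$

$$D_1 = \sum_{\Lambda(e,f) \in \mathcal{H}_2} (t_e + t_f) \tag{20}$$

$$D_2 = \sum_{\Lambda(e,f) \in \mathcal{H}_2} t_e t_f \tag{21}$$

$$T_1 = \sum_{\Delta(e,f,g) \in \mathcal{H}_3} (t_e + t_f + t_g) \tag{22}$$

$$T_2 = \sum_{\Delta(e,f,g) \in \mathcal{H}_3} (t_e t_f + t_f t_g + t_g t_e) \tag{23}$$

$$Y_1 = S_1 + D_1 - 2D_2 + T_1 - 2T_2 + 3Y_3 \tag{24}$$

$$Y_2 = D_2 + T_2 - 3Y_3 \tag{25}$$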
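The decompositions (24) and (25) can be checked numerically on a small graph. The following is a minimal sketch, not the paper's implementation: it enumerates all vertex triples by brute force, and the function names (`triples_by_edge_count`, `evaluate_polynomials`, `identities_hold`) are hypothetical helpers introduced here for illustration.

```python
import random
from itertools import combinations


def triples_by_edge_count(n, edges):
    """Group the vertex triples of the original graph by how many edges they induce."""
    edge_set = {frozenset(e) for e in edges}
    groups = {0: [], 1: [], 2: [], 3: []}
    for trio in combinations(range(n), 3):
        present = [frozenset(pair) for pair in combinations(trio, 2)
                   if frozenset(pair) in edge_set]
        groups[len(present)].append(present)
    return groups


def evaluate_polynomials(n, edges, t):
    """Return (Y, S1, D1, D2, T1, T2) for edge indicators t[e] in {0, 1}."""
    groups = triples_by_edge_count(n, edges)
    # Y[k]: number of triples with exactly k sampled edges (induced k-edge subgraphs).
    Y = [0, 0, 0, 0]
    for k in range(4):
        for triple_edges in groups[k]:
            Y[sum(t[e] for e in triple_edges)] += 1
    S1 = sum(t[e] for (e,) in groups[1])
    D1 = sum(t[e] + t[f] for e, f in groups[2])
    D2 = sum(t[e] * t[f] for e, f in groups[2])
    T1 = sum(t[e] + t[f] + t[g] for e, f, g in groups[3])
    T2 = sum(t[e] * t[f] + t[f] * t[g] + t[g] * t[e] for e, f, g in groups[3])
    return Y, S1, D1, D2, T1, T2


def sample_indicators(edges, p, rng):
    """Keep each edge independently with probability p."""
    return {frozenset(e): int(rng.random() < p) for e in edges}


def identities_hold(n, edges, t):
    """Check the identities (24) and (25) for one realization of t."""
    Y, S1, D1, D2, T1, T2 = evaluate_polynomials(n, edges, t)
    return (Y[1] == S1 + D1 - 2 * D2 + T1 - 2 * T2 + 3 * Y[3]
            and Y[2] == D2 + T2 - 3 * Y[3])


rng = random.Random(0)
n = 8
edges = [pair for pair in combinations(range(n), 2) if rng.random() < 0.5]
t = sample_indicators(edges, 0.4, rng)
print(identities_hold(n, edges, t))
```

Because the identities are algebraic, they hold for every realization of the indicators, not only in expectation; the random sampling above only serves to pick an arbitrary test case.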


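Note that the newly defined polynomials have the following
expectations:

$$\begin{aligned}
\mathbb{E}[S_1] &= p\,n_1 \\
\mathbb{E}[D_1] &= 2p\,n_2 \\
\mathbb{E}[D_2] &= p^2 n_2 \\
\mathbb{E}[T_1] &= 3p\,n_3 \\
\mathbb{E}[T_2] &= 3p^2 n_3.
\end{aligned}$$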


We observe that in the above, even by the change of variables
$y_e = (1 - t_e)$, $Y_1$ and $Y_2$ are not totally positive polynomials.
This means that Proposition cannot be applied
directly to the $Y_i$'s or $X_i$'s. The strategy we adopt is to
split $Y_1$ and $Y_2$ into many polynomials, each of which
is totally positive, and then apply Proposition on each of
them. $P = \{Y_0, Y_3, S_1, D_1, D_2, T_1, T_2\}$ forms the set of totally
positive polynomials (proved below). Substituting the
above equations into the estimators, we have the following system
of equations that connect the $X_i$'s and the set of totally positive
polynomials P:

$$\begin{aligned}
X_0 &= Y_0 - \frac{1-p}{p}\,(S_1 + D_1 + T_1 - 2D_2 - 2T_2 + 3Y_3) + \frac{(1-p)^2}{p^2}\,(D_2 + T_2 - 3Y_3) - \frac{(1-p)^3}{p^3}\,Y_3 \\
&= Y_0 - \frac{1-p}{p}\,(S_1 + D_1 + T_1) + \frac{1-p^2}{p^2}\,(D_2 + T_2) - \frac{1-p^3}{p^3}\,Y_3.
\end{aligned} \tag{26}$$

$$\begin{aligned}
X_1 &= \frac{1}{p}\,(S_1 + D_1 + T_1 - 2D_2 - 2T_2 + 3Y_3) - \frac{2(1-p)}{p^2}\,(D_2 + T_2 - 3Y_3) + \frac{3(1-p)^2}{p^3}\,Y_3 \\
&= \frac{1}{p}\,(S_1 + D_1 + T_1) - \frac{2}{p^2}\,(D_2 + T_2) + \frac{3}{p^3}\,Y_3.
\end{aligned} \tag{27}$$

$$\begin{aligned}
X_2 &= \frac{1}{p^2}\,(D_2 + T_2 - 3Y_3) - \frac{3(1-p)}{p^3}\,Y_3 \\
&= \frac{1}{p^2}\,(D_2 + T_2) - \frac{3}{p^3}\,Y_3.
\end{aligned} \tag{28}$$

$$X_3 = \frac{1}{p^3}\,Y_3. \tag{29}$$
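As a sanity check on the estimators, their expectations can be computed exactly on a tiny graph by enumerating every edge subset. The sketch below is an illustration under stated assumptions rather than the distributed algorithm of the paper: it writes each $X_i$ in terms of $Y_0, \ldots, Y_3$ (the form obtained by reading (24) and (25) back into the first line of each estimator), uses exact rational arithmetic, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def profile(n, edge_set):
    """Count vertex triples inducing 0, 1, 2 and 3 edges: [n_0, n_1, n_2, n_3]."""
    counts = [0, 0, 0, 0]
    for trio in combinations(range(n), 3):
        k = sum(frozenset(pair) in edge_set for pair in combinations(trio, 2))
        counts[k] += 1
    return counts


def estimators(Y, p):
    """Estimators X_0..X_3 computed from the sampled-graph counts Y_0..Y_3."""
    Y0, Y1, Y2, Y3 = Y
    q = 1 - p
    X0 = Y0 - (q / p) * Y1 + (q / p) ** 2 * Y2 - (q / p) ** 3 * Y3
    X1 = Y1 / p - 2 * q / p ** 2 * Y2 + 3 * q ** 2 / p ** 3 * Y3
    X2 = Y2 / p ** 2 - 3 * q / p ** 3 * Y3
    X3 = Y3 / p ** 3
    return [X0, X1, X2, X3]


def expected_estimators(n, edges, p):
    """Exact E[X_i] under independent edge sampling, by enumerating all subsets."""
    edges = [frozenset(e) for e in edges]
    total = [Fraction(0)] * 4
    for keep in product([0, 1], repeat=len(edges)):
        weight = Fraction(1)
        for bit in keep:
            weight *= p if bit else 1 - p
        sampled = {e for e, bit in zip(edges, keep) if bit}
        X = estimators(profile(n, sampled), p)
        total = [acc + weight * x for acc, x in zip(total, X)]
    return total


n = 5
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
p = Fraction(1, 3)
original = profile(n, {frozenset(e) for e in edges})
print(expected_estimators(n, edges, p) == original)
```

The enumeration costs $2^m$ profile computations, so it is only practical for graphs with a handful of edges; its purpose is to confirm unbiasedness, not to estimate anything.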


Let $\alpha_e$, $\beta_e$, and $\Delta_e$ be the maximum number of $H_1$'s,
$H_2$'s, and $H_3$'s containing an edge e in the original graph G.
Let $\alpha$, $\beta$ and $\Delta$ be the maximum of $\alpha_e$, $\beta_e$, and $\Delta_e$ over all
edges e. We now show concentration results for the totally
positive polynomials alone.
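The per-edge quantities above can be computed from neighborhood sets. The following sketch assumes vertices are labeled 0 to n-1 and that the graph has at least one edge; `edge_containment_counts` and `graph_maxima` are hypothetical names used only for this illustration. For an edge (u, v), a third vertex w forms a triangle if it is adjacent to both endpoints, a wedge if it is adjacent to exactly one, and a single-edge triple if it is adjacent to neither.

```python
def edge_containment_counts(n, edges):
    """Map each edge to (alpha_e, beta_e, Delta_e): the number of single-edge
    triples, wedges and triangles of the graph that contain that edge."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    counts = {}
    for u, v in edges:
        both = len(adj[u] & adj[v])                # w adjacent to u and v
        either = len((adj[u] | adj[v]) - {u, v})   # w adjacent to u or v
        counts[(u, v)] = (n - 2 - either, either - both, both)
    return counts


def graph_maxima(n, edges):
    """Return (alpha, beta, Delta), the maxima over all edges."""
    counts = edge_containment_counts(n, edges)
    return tuple(max(c[i] for c in counts.values()) for i in range(3))


# A triangle on {0, 1, 2} with a pendant edge (2, 3).
example_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(edge_containment_counts(4, example_edges))
print(graph_maxima(4, example_edges))
```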

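Lemma 2. Define variables $y_e = 1 - t_e$. Then $Y_0$ is totally
positive in $y_e$. With respect to the variables $y_e$, $\mathbb{E}_{\geq 1}[Y_0] \leq 3\max\{\alpha, \beta, \Delta\}$.

Proof. We have the expectation of the following partial
derivatives, up to the third order:

$$\mathbb{E}\left[\frac{\partial Y_0}{\partial y_e}\right] = \alpha_e + (1-p)\beta_e + (1-p)^2\Delta_e \leq 3\max\{\alpha_e, \beta_e, \Delta_e\},$$

$$\mathbb{E}\left[\frac{\partial^2 Y_0}{\partial y_e\,\partial y_f}\right] \leq 1 + (1-p) \leq 2, \qquad \mathbb{E}\left[\frac{\partial^3 Y_0}{\partial y_e\,\partial y_f\,\partial y_g}\right] \leq 1.$$

From the above equations, we have $\mathbb{E}_{\geq 1}[Y_0] \leq 3\max\{\alpha, \beta, \Delta\}$
for a nonempty graph. □

To satisfy $\mathbb{E}_{\geq 1}[Y_0] \leq \mathbb{E}[Y_0]$, it is sufficient to have

$$n_0 \geq 3\max\{\alpha, \beta, \Delta\}. \tag{30}$$

This is because $Y_0 \geq n_0$ with probability 1.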

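Lemma 3. $Y_3$ is totally positive in $t_e$. With respect to the
variables $t_e$, $\mathbb{E}_{\geq 1}[Y_3] \leq \max\{1, p^2\Delta\}$.

Proof. We have the expectation of the following partial
derivatives, up to the third order:

$$\mathbb{E}\left[\frac{\partial Y_3}{\partial t_e}\right] = p^2\Delta_e, \qquad \mathbb{E}\left[\frac{\partial^2 Y_3}{\partial t_e\,\partial t_f}\right] = p \leq 1, \qquad \mathbb{E}\left[\frac{\partial^3 Y_3}{\partial t_e\,\partial t_f\,\partial t_g}\right] \leq 1.$$

From the above equations, we have $\mathbb{E}_{\geq 1}[Y_3] \leq \max\{1, p^2\Delta\}$. □

$\mathbb{E}_{\geq 1}[Y_3] \leq \mathbb{E}[Y_3]$ implies

$$p \geq \max\left\{\frac{1}{\sqrt[3]{n_3}}, \frac{\Delta}{n_3}\right\}. \tag{31}$$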

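Lemma 4. $S_1$ is totally positive in $t_e$. With respect to the
variables $t_e$, $\mathbb{E}_{\geq 1}[S_1] \leq \alpha$.

Proof. We have the expectation of the following partial
derivatives, up to the second order:

$$\mathbb{E}\left[\frac{\partial S_1}{\partial t_e}\right] = \alpha_e, \qquad \mathbb{E}\left[\frac{\partial^2 S_1}{\partial t_e\,\partial t_f}\right] = 0.$$

From the above equations, we have $\mathbb{E}_{\geq 1}[S_1] \leq \alpha$. □

$\mathbb{E}_{\geq 1}[S_1] \leq \mathbb{E}[S_1]$ implies

$$p \geq \alpha/n_1. \tag{32}$$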

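Lemma 5. $D_1$ is totally positive in $t_e$. With respect to
the variables $t_e$, $\mathbb{E}_{\geq 1}[D_1] \leq \beta$.

Proof. We have the expectation of the following partial
derivatives, up to the second order:

$$\mathbb{E}\left[\frac{\partial D_1}{\partial t_e}\right] = \beta_e, \qquad \mathbb{E}\left[\frac{\partial^2 D_1}{\partial t_e\,\partial t_f}\right] = 0.$$

From the above equations, we have $\mathbb{E}_{\geq 1}[D_1] \leq \beta$. □

$\mathbb{E}_{\geq 1}[D_1] \leq \mathbb{E}[D_1]$ implies

$$p \geq \beta/(2n_2). \tag{33}$$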

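Lemma 6. $T_1$ is totally positive in $t_e$. With respect to the
variables $t_e$, $\mathbb{E}_{\geq 1}[T_1] \leq \Delta$.

Proof. We have the expectation of the following partial
derivatives, up to the second order:

$$\mathbb{E}\left[\frac{\partial T_1}{\partial t_e}\right] = \Delta_e, \qquad \mathbb{E}\left[\frac{\partial^2 T_1}{\partial t_e\,\partial t_f}\right] = 0.$$

From the above equations, we have $\mathbb{E}_{\geq 1}[T_1] \leq \Delta$. □

$\mathbb{E}_{\geq 1}[T_1] \leq \mathbb{E}[T_1]$ implies

$$p \geq \Delta/(3n_3). \tag{34}$$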

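Lemma 7. $D_2$ is totally positive in $t_e$. With respect to
the variables $t_e$, $\mathbb{E}_{\geq 1}[D_2] \leq \max\{p\beta, 1\}$.

Proof. We have the expectation of the following partial
derivatives, up to the second order:

$$\mathbb{E}\left[\frac{\partial D_2}{\partial t_e}\right] = p\beta_e, \qquad \mathbb{E}\left[\frac{\partial^2 D_2}{\partial t_e\,\partial t_f}\right] \leq 1.$$

From the above equations, we have $\mathbb{E}_{\geq 1}[D_2] \leq \max\{p\beta, 1\}$. □

$\mathbb{E}_{\geq 1}[D_2] \leq \mathbb{E}[D_2]$ implies

$$p \geq \max\left\{\frac{\beta}{n_2}, \frac{1}{\sqrt{n_2}}\right\}. \tag{35}$$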

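Lemma 8. $T_2$ is totally positive in $t_e$. With respect to the
variables $t_e$, $\mathbb{E}_{\geq 1}[T_2] \leq \max\{2p\Delta, 1\}$.

Proof. We have the expectation of the following partial
derivatives, up to the second order:

$$\mathbb{E}\left[\frac{\partial T_2}{\partial t_e}\right] = 2p\Delta_e, \qquad \mathbb{E}\left[\frac{\partial^2 T_2}{\partial t_e\,\partial t_f}\right] \leq 1.$$

From the above equations, we have $\mathbb{E}_{\geq 1}[T_2] \leq \max\{2p\Delta, 1\}$. □

$\mathbb{E}_{\geq 1}[T_2] \leq \mathbb{E}[T_2]$ implies

$$p \geq \max\left\{\frac{2\Delta}{3n_3}, \frac{1}{\sqrt{3n_3}}\right\}. \tag{36}$$

Now merging all the conditions (30)-(36), we get

$$\begin{aligned}
n_0 &\geq 3\max\{\alpha, \beta, \Delta\} \\
p &\geq \max\left\{\frac{1}{\sqrt[3]{n_3}}, \frac{1}{\sqrt{n_2}}, \frac{\Delta}{n_3}, \frac{\beta}{n_2}, \frac{\alpha}{n_1}\right\}.
\end{aligned} \tag{37}$$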
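To make the merged condition concrete, the sketch below evaluates the lower bound on p and confirms that it implies each per-lemma requirement. It assumes the merged bound reads $p \geq \max\{1/\sqrt[3]{n_3}, 1/\sqrt{n_2}, \Delta/n_3, \beta/n_2, \alpha/n_1\}$ and that $n_1, n_2, n_3 > 0$; the function names and the numeric inputs are hypothetical and chosen only for illustration.

```python
import math


def merged_probability_bound(n1, n2, n3, alpha, beta, delta):
    """Smallest p allowed by the merged condition (37), as read above."""
    return max(n3 ** (-1.0 / 3.0), 1.0 / math.sqrt(n2),
               delta / n3, beta / n2, alpha / n1)


def individual_conditions_hold(p, n1, n2, n3, alpha, beta, delta, tol=1e-12):
    """Check the per-lemma lower bounds (31)-(36) at sampling probability p."""
    lower_bounds = [
        max(n3 ** (-1.0 / 3.0), delta / n3),                 # (31), for Y_3
        alpha / n1,                                          # (32), for S_1
        beta / (2 * n2),                                     # (33), for D_1
        delta / (3 * n3),                                    # (34), for T_1
        max(beta / n2, 1.0 / math.sqrt(n2)),                 # (35), for D_2
        max(2 * delta / (3 * n3), 1.0 / math.sqrt(3 * n3)),  # (36), for T_2
    ]
    return all(p + tol >= bound for bound in lower_bounds)


# Hypothetical counts, not taken from any dataset in the paper.
stats = dict(n1=1000, n2=400, n3=64, alpha=10, beta=20, delta=4)
p_min = merged_probability_bound(**stats)
print(p_min, individual_conditions_hold(p_min, **stats))
```

The merged bound drops $\beta/(2n_2)$, $\Delta/(3n_3)$, $2\Delta/(3n_3)$ and $1/\sqrt{3n_3}$ because each is dominated by a term that is kept.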


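Applying Proposition to all the totally positive polynomials,
along with (37), we get

$$\begin{aligned}
\mathbb{P}\left(|Y_0 - \mathbb{E}[Y_0]| > a_3\sqrt{\mathbb{E}[Y_0]\,\mathbb{E}_{\geq 1}[Y_0]}\,\lambda_1^3\right) &= O\left(\exp(-\lambda_1 + 2\log m)\right) \\
\mathbb{P}\left(|Y_3 - \mathbb{E}[Y_3]| > a_3\sqrt{\mathbb{E}[Y_3]\,\mathbb{E}_{\geq 1}[Y_3]}\,\lambda_2^3\right) &= O\left(\exp(-\lambda_2 + 2\log m)\right) \\
\mathbb{P}\left(|S_1 - \mathbb{E}[S_1]| > a_1\sqrt{\mathbb{E}[S_1]\,\mathbb{E}_{\geq 1}[S_1]}\,\lambda_3\right) &= O\left(\exp(-\lambda_3)\right) \\
\mathbb{P}\left(|D_1 - \mathbb{E}[D_1]| > a_1\sqrt{\mathbb{E}[D_1]\,\mathbb{E}_{\geq 1}[D_1]}\,\lambda_4\right) &= O\left(\exp(-\lambda_4)\right) \\
\mathbb{P}\left(|T_1 - \mathbb{E}[T_1]| > a_1\sqrt{\mathbb{E}[T_1]\,\mathbb{E}_{\geq 1}[T_1]}\,\lambda_5\right) &= O\left(\exp(-\lambda_5)\right) \\
\mathbb{P}\left(|D_2 - \mathbb{E}[D_2]| > a_2\sqrt{\mathbb{E}[D_2]\,\mathbb{E}_{\geq 1}[D_2]}\,\lambda_6^2\right) &= O\left(\exp(-\lambda_6 + \log m)\right) \\
\mathbb{P}\left(|T_2 - \mathbb{E}[T_2]| > a_2\sqrt{\mathbb{E}[T_2]\,\mathbb{E}_{\geq 1}[T_2]}\,\lambda_7^2\right) &= O\left(\exp(-\lambda_7 + \log m)\right).
\end{aligned} \tag{38}$$

Choose an $\epsilon > 0$. We force the following conditions:

$$\begin{aligned}
a_3\sqrt{\mathbb{E}[Y_0]\,\mathbb{E}_{\geq 1}[Y_0]}\,\lambda_1^3 &= \epsilon\,\mathbb{E}[Y_0] \\
a_3\sqrt{\mathbb{E}[Y_3]\,\mathbb{E}_{\geq 1}[Y_3]}\,\lambda_2^3 &= \epsilon\,\mathbb{E}[Y_3] \\
a_1\sqrt{\mathbb{E}[S_1]\,\mathbb{E}_{\geq 1}[S_1]}\,\lambda_3 &= \epsilon\,\mathbb{E}[S_1] \\
a_1\sqrt{\mathbb{E}[D_1]\,\mathbb{E}_{\geq 1}[D_1]}\,\lambda_4 &= \epsilon\,\mathbb{E}[D_1] \\
a_1\sqrt{\mathbb{E}[T_1]\,\mathbb{E}_{\geq 1}[T_1]}\,\lambda_5 &= \epsilon\,\mathbb{E}[T_1] \\
a_2\sqrt{\mathbb{E}[D_2]\,\mathbb{E}_{\geq 1}[D_2]}\,\lambda_6^2 &= \epsilon\,\mathbb{E}[D_2] \\
a_2\sqrt{\mathbb{E}[T_2]\,\mathbb{E}_{\geq 1}[T_2]}\,\lambda_7^2 &= \epsilon\,\mathbb{E}[T_2].
\end{aligned} \tag{39}$$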


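Let $\gamma > 0$. For the right hand side of every equation in
(38) to be $O(\exp(-\gamma\log m))$, assuming all the bounds in
Lemmas 2-8, it is sufficient to have

$$\begin{aligned}
\frac{n_0}{3\max\{\alpha, \beta, \Delta\}} &\geq \frac{a_3^2\log^6\left(m^{2+\gamma}\right)}{\epsilon^2} \\
\frac{p}{\max\left\{\frac{1}{\sqrt[3]{n_3}}, \Delta/n_3\right\}} &\geq \frac{a_3^2\log^6\left(m^{2+\gamma}\right)}{\epsilon^2} \\
\frac{p}{\max\left\{\frac{\alpha}{n_1}, \frac{\beta}{2n_2}, \frac{\Delta}{3n_3}\right\}} &\geq \frac{a_1^2\log^2\left(m^{\gamma}\right)}{\epsilon^2} \\
\frac{p}{\max\left\{\frac{\beta}{n_2}, \frac{2\Delta}{3n_3}, \frac{1}{\sqrt{n_2}}, \frac{1}{\sqrt{3n_3}}\right\}} &\geq \frac{a_2^2\log^4\left(m^{1+\gamma}\right)}{\epsilon^2}.
\end{aligned} \tag{40}$$

We can see that the conditions in (40) imply the conditions
in (37). These can be simplified to remove some redundancy
as follows:

$$\begin{aligned}
\frac{n_0}{3\max\{\alpha, \beta, \Delta\}} &\geq \frac{a_3^2\log^6\left(m^{2+\gamma}\right)}{\epsilon^2} \\
\frac{p}{\max\left\{\frac{1}{\sqrt[3]{n_3}}, \Delta/n_3\right\}} &\geq \frac{a_3^2\log^6\left(m^{2+\gamma}\right)}{\epsilon^2} \\
\frac{p}{\alpha/n_1} &\geq \frac{a_1^2\log^2\left(m^{\gamma}\right)}{\epsilon^2} \\
\frac{p}{\max\left\{\frac{\beta}{n_2}, \frac{1}{\sqrt{n_2}}\right\}} &\geq \frac{a_2^2\log^4\left(m^{1+\gamma}\right)}{\epsilon^2}.
\end{aligned} \tag{41}$$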


This is due to the fact that 03 > 02 > ai and m® > m^. 
Therefore, subject to (41), all totally positive polynomials
$Y_0, Y_3, D_1, D_2, S_1, T_1, T_2$ concentrate within a multiplicative
factor of $(1 \pm \epsilon)$ with probability at least $1 - O(1/m^{\gamma})$.
Under the above concentration result, let the deviations
of the $X_i$'s be denoted by $\delta X_i$. Now we calculate the deviation
of $X_0$ using (26).


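$$\begin{aligned}
\delta X_0 &\leq \epsilon\,\mathbb{E}[Y_0] + \epsilon\,\frac{1-p}{p}\left(|\mathbb{E}[S_1]| + |\mathbb{E}[D_1]| + |\mathbb{E}[T_1]|\right) \\
&\quad + \epsilon\,\frac{1-p^2}{p^2}\left(|\mathbb{E}[D_2]| + |\mathbb{E}[T_2]|\right) + \epsilon\,\frac{1-p^3}{p^3}\,|\mathbb{E}[Y_3]| \\
&\leq \epsilon\,(n_0 + n_1 + 3n_2 + 7n_3) \\
&\leq 7\epsilon\,(n_0 + n_1 + n_2 + n_3).
\end{aligned}$$

Similarly for the other $X_i$'s, we get

$$\begin{aligned}
\delta X_1 &\leq 12\epsilon\,(n_1 + n_2 + n_3) \\
\delta X_2 &\leq 6\epsilon\,(n_2 + n_3) \\
\delta X_3 &\leq \epsilon\,n_3.
\end{aligned}$$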


Therefore, sampling every edge independently with probability
p satisfying all conditions in (41), all $X_i$'s concentrate
within an additive gap of $(1 \pm 12\epsilon)\binom{|V|}{3}$ with probability at
least $1 - O(1/m^{\gamma})$. The constants in this proof can be tightened
by a more accurate analysis.